{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "_cell_guid": "0b9ffbb2-8d33-eb0c-5a42-0733bdc3b74b",
    "_uuid": "e14f65723f5ac826104e861c45e58525740d8b2a"
   },
   "source": [
    "# Bikeshare数据集上的特征工程\n",
    "\n",
    "1、\t任务描述\n",
    "请在Capital Bikeshare （美国Washington, D.C.的一个共享单车公司）提供的自行车数据上进行回归分析。根据每天的天气信息，预测该天的单车共享骑行量。\n",
    "\n",
    "原始数据集地址：http://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset\n",
    "1)\t文件说明\n",
    "day.csv: 按天计的单车共享次数（作业只需使用该文件）\n",
    "hour.csv: 按小时计的单车共享次数（无需理会）\n",
    "readme：数据说明文件\n",
    "\n",
    "2)\t字段说明\n",
    "Instant记录号\n",
    "Dteday：日期\n",
    "Season：季节（1=春天、2=夏天、3=秋天、4=冬天）\n",
    "yr：年份，(0: 2011, 1:2012)\n",
    "mnth：月份( 1 to 12)\n",
    "hr：小时 (0 to 23)  （只在hour.csv有，作业忽略此字段）\n",
    "holiday：是否是节假日\n",
    "weekday：星期中的哪天，取值为0～6\n",
    "workingday：是否工作日\n",
    "1=工作日 （是否为工作日，1为工作日，0为非周末或节假日\n",
    "weathersit：天气（1：晴天，多云 ",
    "2：雾天，阴天 ",
    "3：小雪，小雨 ",
    "4：大雨，大雪，大雾）\n",
    "temp：气温摄氏度\n",
    "atemp：体感温度\n",
    "hum：湿度\n",
    "windspeed：风速\n",
    "casual：非注册用户个数\n",
    "registered：注册用户个数\n",
    "cnt：给定日期（天）时间（每小时）总租车人数，响应变量y （cnt = casual + registered）\n",
    "\n",
    "casual、registered和cnt三个特征均为要预测的y，作业里只需对cnt进行预测"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 导入必要的工具包"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "_cell_guid": "2ba3154a-c2aa-3158-1984-63ad2c0c786a",
    "_uuid": "5eb696b95780825e94ddb49787f9fa339fc3833b",
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# 数据读取及基本处理\n",
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 读入数据\n",
    "\n",
    "数据预处理对训练数据和测试数据需进行同样处理，因此将二者一起读入"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "_cell_guid": "21fa35be-878b-b4f2-ef6e-68dc070b8bfa",
    "_uuid": "73aee228226be55c0c8a6e4fcbf818c56cd94926",
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>instant</th>\n",
       "      <th>dteday</th>\n",
       "      <th>season</th>\n",
       "      <th>yr</th>\n",
       "      <th>mnth</th>\n",
       "      <th>holiday</th>\n",
       "      <th>weekday</th>\n",
       "      <th>workingday</th>\n",
       "      <th>weathersit</th>\n",
       "      <th>temp</th>\n",
       "      <th>atemp</th>\n",
       "      <th>hum</th>\n",
       "      <th>windspeed</th>\n",
       "      <th>casual</th>\n",
       "      <th>registered</th>\n",
       "      <th>cnt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2011-01-01</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0.344167</td>\n",
       "      <td>0.363625</td>\n",
       "      <td>0.805833</td>\n",
       "      <td>0.160446</td>\n",
       "      <td>331</td>\n",
       "      <td>654</td>\n",
       "      <td>985</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>2011-01-02</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0.363478</td>\n",
       "      <td>0.353739</td>\n",
       "      <td>0.696087</td>\n",
       "      <td>0.248539</td>\n",
       "      <td>131</td>\n",
       "      <td>670</td>\n",
       "      <td>801</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>2011-01-03</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0.196364</td>\n",
       "      <td>0.189405</td>\n",
       "      <td>0.437273</td>\n",
       "      <td>0.248309</td>\n",
       "      <td>120</td>\n",
       "      <td>1229</td>\n",
       "      <td>1349</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>2011-01-04</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0.200000</td>\n",
       "      <td>0.212122</td>\n",
       "      <td>0.590435</td>\n",
       "      <td>0.160296</td>\n",
       "      <td>108</td>\n",
       "      <td>1454</td>\n",
       "      <td>1562</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>2011-01-05</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0.226957</td>\n",
       "      <td>0.229270</td>\n",
       "      <td>0.436957</td>\n",
       "      <td>0.186900</td>\n",
       "      <td>82</td>\n",
       "      <td>1518</td>\n",
       "      <td>1600</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   instant      dteday  season  yr  mnth  holiday  weekday  workingday  \\\n",
       "0        1  2011-01-01       1   0     1        0        6           0   \n",
       "1        2  2011-01-02       1   0     1        0        0           0   \n",
       "2        3  2011-01-03       1   0     1        0        1           1   \n",
       "3        4  2011-01-04       1   0     1        0        2           1   \n",
       "4        5  2011-01-05       1   0     1        0        3           1   \n",
       "\n",
       "   weathersit      temp     atemp       hum  windspeed  casual  registered  \\\n",
       "0           2  0.344167  0.363625  0.805833   0.160446     331         654   \n",
       "1           2  0.363478  0.353739  0.696087   0.248539     131         670   \n",
       "2           1  0.196364  0.189405  0.437273   0.248309     120        1229   \n",
       "3           1  0.200000  0.212122  0.590435   0.160296     108        1454   \n",
       "4           1  0.226957  0.229270  0.436957   0.186900      82        1518   \n",
       "\n",
       "    cnt  \n",
       "0   985  \n",
       "1   801  \n",
       "2  1349  \n",
       "3  1562  \n",
       "4  1600  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 读入数据\n",
    "train = pd.read_csv(\"day.csv\")\n",
    "train.head()\n",
    "#print(\"train : \" + str(train.shape))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 731 entries, 0 to 730\n",
      "Data columns (total 16 columns):\n",
      "instant       731 non-null int64\n",
      "dteday        731 non-null object\n",
      "season        731 non-null int64\n",
      "yr            731 non-null int64\n",
      "mnth          731 non-null int64\n",
      "holiday       731 non-null int64\n",
      "weekday       731 non-null int64\n",
      "workingday    731 non-null int64\n",
      "weathersit    731 non-null int64\n",
      "temp          731 non-null float64\n",
      "atemp         731 non-null float64\n",
      "hum           731 non-null float64\n",
      "windspeed     731 non-null float64\n",
      "casual        731 non-null int64\n",
      "registered    731 non-null int64\n",
      "cnt           731 non-null int64\n",
      "dtypes: float64(4), int64(11), object(1)\n",
      "memory usage: 91.4+ KB\n"
     ]
    }
   ],
   "source": [
    "#train.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "没有缺失数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 特征工程"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 类别型特征编码\n",
    "对类别型特征进行独热编码"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>season_1</th>\n",
       "      <th>season_2</th>\n",
       "      <th>season_3</th>\n",
       "      <th>season_4</th>\n",
       "      <th>mnth_1</th>\n",
       "      <th>mnth_2</th>\n",
       "      <th>mnth_3</th>\n",
       "      <th>mnth_4</th>\n",
       "      <th>mnth_5</th>\n",
       "      <th>mnth_6</th>\n",
       "      <th>...</th>\n",
       "      <th>weathersit_1</th>\n",
       "      <th>weathersit_2</th>\n",
       "      <th>weathersit_3</th>\n",
       "      <th>weekday_0</th>\n",
       "      <th>weekday_1</th>\n",
       "      <th>weekday_2</th>\n",
       "      <th>weekday_3</th>\n",
       "      <th>weekday_4</th>\n",
       "      <th>weekday_5</th>\n",
       "      <th>weekday_6</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 26 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   season_1  season_2  season_3  season_4  mnth_1  mnth_2  mnth_3  mnth_4  \\\n",
       "0         1         0         0         0       1       0       0       0   \n",
       "1         1         0         0         0       1       0       0       0   \n",
       "2         1         0         0         0       1       0       0       0   \n",
       "3         1         0         0         0       1       0       0       0   \n",
       "4         1         0         0         0       1       0       0       0   \n",
       "\n",
       "   mnth_5  mnth_6    ...      weathersit_1  weathersit_2  weathersit_3  \\\n",
       "0       0       0    ...                 0             1             0   \n",
       "1       0       0    ...                 0             1             0   \n",
       "2       0       0    ...                 1             0             0   \n",
       "3       0       0    ...                 1             0             0   \n",
       "4       0       0    ...                 1             0             0   \n",
       "\n",
       "   weekday_0  weekday_1  weekday_2  weekday_3  weekday_4  weekday_5  weekday_6  \n",
       "0          0          0          0          0          0          0          1  \n",
       "1          1          0          0          0          0          0          0  \n",
       "2          0          1          0          0          0          0          0  \n",
       "3          0          0          1          0          0          0          0  \n",
       "4          0          0          0          1          0          0          0  \n",
       "\n",
       "[5 rows x 26 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#对类别型特征，观察其取值范围及直方图\n",
    "categorical_features = ['season','mnth','weathersit','weekday']\n",
    "\n",
    "#数据类型变为object，才能被get_dummies处理\n",
    "for col in categorical_features:\n",
    "    train[col] = train[col].astype('object')\n",
    "    \n",
    "X_train_cat = train[categorical_features]\n",
    "X_train_cat = pd.get_dummies(X_train_cat)\n",
    "X_train_cat.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 数值型特征\n",
    "对数值型特征进行标准化/MinMaxScaler，去量纲"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>temp</th>\n",
       "      <th>atemp</th>\n",
       "      <th>hum</th>\n",
       "      <th>windspeed</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.355170</td>\n",
       "      <td>0.373517</td>\n",
       "      <td>0.828620</td>\n",
       "      <td>0.284606</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.379232</td>\n",
       "      <td>0.360541</td>\n",
       "      <td>0.715771</td>\n",
       "      <td>0.466215</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.171000</td>\n",
       "      <td>0.144830</td>\n",
       "      <td>0.449638</td>\n",
       "      <td>0.465740</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.175530</td>\n",
       "      <td>0.174649</td>\n",
       "      <td>0.607131</td>\n",
       "      <td>0.284297</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.209120</td>\n",
       "      <td>0.197158</td>\n",
       "      <td>0.449313</td>\n",
       "      <td>0.339143</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       temp     atemp       hum  windspeed\n",
       "0  0.355170  0.373517  0.828620   0.284606\n",
       "1  0.379232  0.360541  0.715771   0.466215\n",
       "2  0.171000  0.144830  0.449638   0.465740\n",
       "3  0.175530  0.174649  0.607131   0.284297\n",
       "4  0.209120  0.197158  0.449313   0.339143"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#数值型变量预处理，\n",
    "#感觉数据已经做过处理（取值都在0-1之间），这里用MinMaxScaler再处理一次\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "mn_X = MinMaxScaler()\n",
    "numerical_features = ['temp','atemp','hum','windspeed']\n",
    "temp = mn_X.fit_transform(train[numerical_features])\n",
    "\n",
    "X_train_num = pd.DataFrame(data=temp, columns=numerical_features, index =train.index)\n",
    "X_train_num.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "_cell_guid": "3685f20a-c332-7ab0-cced-03d639d28833",
    "_uuid": "39e8e6e7fa4b7f6e628be0bd6aa3b859cd624d2c",
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>season_1</th>\n",
       "      <th>season_2</th>\n",
       "      <th>season_3</th>\n",
       "      <th>season_4</th>\n",
       "      <th>mnth_1</th>\n",
       "      <th>mnth_2</th>\n",
       "      <th>mnth_3</th>\n",
       "      <th>mnth_4</th>\n",
       "      <th>mnth_5</th>\n",
       "      <th>mnth_6</th>\n",
       "      <th>...</th>\n",
       "      <th>weekday_3</th>\n",
       "      <th>weekday_4</th>\n",
       "      <th>weekday_5</th>\n",
       "      <th>weekday_6</th>\n",
       "      <th>temp</th>\n",
       "      <th>atemp</th>\n",
       "      <th>hum</th>\n",
       "      <th>windspeed</th>\n",
       "      <th>holiday</th>\n",
       "      <th>workingday</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.355170</td>\n",
       "      <td>0.373517</td>\n",
       "      <td>0.828620</td>\n",
       "      <td>0.284606</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.379232</td>\n",
       "      <td>0.360541</td>\n",
       "      <td>0.715771</td>\n",
       "      <td>0.466215</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.171000</td>\n",
       "      <td>0.144830</td>\n",
       "      <td>0.449638</td>\n",
       "      <td>0.465740</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.175530</td>\n",
       "      <td>0.174649</td>\n",
       "      <td>0.607131</td>\n",
       "      <td>0.284297</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.209120</td>\n",
       "      <td>0.197158</td>\n",
       "      <td>0.449313</td>\n",
       "      <td>0.339143</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 32 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   season_1  season_2  season_3  season_4  mnth_1  mnth_2  mnth_3  mnth_4  \\\n",
       "0         1         0         0         0       1       0       0       0   \n",
       "1         1         0         0         0       1       0       0       0   \n",
       "2         1         0         0         0       1       0       0       0   \n",
       "3         1         0         0         0       1       0       0       0   \n",
       "4         1         0         0         0       1       0       0       0   \n",
       "\n",
       "   mnth_5  mnth_6     ...      weekday_3  weekday_4  weekday_5  weekday_6  \\\n",
       "0       0       0     ...              0          0          0          1   \n",
       "1       0       0     ...              0          0          0          0   \n",
       "2       0       0     ...              0          0          0          0   \n",
       "3       0       0     ...              0          0          0          0   \n",
       "4       0       0     ...              1          0          0          0   \n",
       "\n",
       "       temp     atemp       hum  windspeed  holiday  workingday  \n",
       "0  0.355170  0.373517  0.828620   0.284606        0           0  \n",
       "1  0.379232  0.360541  0.715771   0.466215        0           0  \n",
       "2  0.171000  0.144830  0.449638   0.465740        0           1  \n",
       "3  0.175530  0.174649  0.607131   0.284297        0           1  \n",
       "4  0.209120  0.197158  0.449313   0.339143        0           1  \n",
       "\n",
       "[5 rows x 32 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Join categorical and numerical features\n",
    "X_train = pd.concat([X_train_cat, X_train_num, train['holiday'],  train['workingday']], axis = 1, ignore_index=False)\n",
    "X_train.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>instant</th>\n",
       "      <th>season_1</th>\n",
       "      <th>season_2</th>\n",
       "      <th>season_3</th>\n",
       "      <th>season_4</th>\n",
       "      <th>mnth_1</th>\n",
       "      <th>mnth_2</th>\n",
       "      <th>mnth_3</th>\n",
       "      <th>mnth_4</th>\n",
       "      <th>mnth_5</th>\n",
       "      <th>...</th>\n",
       "      <th>weekday_5</th>\n",
       "      <th>weekday_6</th>\n",
       "      <th>temp</th>\n",
       "      <th>atemp</th>\n",
       "      <th>hum</th>\n",
       "      <th>windspeed</th>\n",
       "      <th>holiday</th>\n",
       "      <th>workingday</th>\n",
       "      <th>yr</th>\n",
       "      <th>cnt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.355170</td>\n",
       "      <td>0.373517</td>\n",
       "      <td>0.828620</td>\n",
       "      <td>0.284606</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>985</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.379232</td>\n",
       "      <td>0.360541</td>\n",
       "      <td>0.715771</td>\n",
       "      <td>0.466215</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>801</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.171000</td>\n",
       "      <td>0.144830</td>\n",
       "      <td>0.449638</td>\n",
       "      <td>0.465740</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1349</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.175530</td>\n",
       "      <td>0.174649</td>\n",
       "      <td>0.607131</td>\n",
       "      <td>0.284297</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1562</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.209120</td>\n",
       "      <td>0.197158</td>\n",
       "      <td>0.449313</td>\n",
       "      <td>0.339143</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1600</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 35 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   instant  season_1  season_2  season_3  season_4  mnth_1  mnth_2  mnth_3  \\\n",
       "0        1         1         0         0         0       1       0       0   \n",
       "1        2         1         0         0         0       1       0       0   \n",
       "2        3         1         0         0         0       1       0       0   \n",
       "3        4         1         0         0         0       1       0       0   \n",
       "4        5         1         0         0         0       1       0       0   \n",
       "\n",
       "   mnth_4  mnth_5  ...   weekday_5  weekday_6      temp     atemp       hum  \\\n",
       "0       0       0  ...           0          1  0.355170  0.373517  0.828620   \n",
       "1       0       0  ...           0          0  0.379232  0.360541  0.715771   \n",
       "2       0       0  ...           0          0  0.171000  0.144830  0.449638   \n",
       "3       0       0  ...           0          0  0.175530  0.174649  0.607131   \n",
       "4       0       0  ...           0          0  0.209120  0.197158  0.449313   \n",
       "\n",
       "   windspeed  holiday  workingday  yr   cnt  \n",
       "0   0.284606        0           0   0   985  \n",
       "1   0.466215        0           0   0   801  \n",
       "2   0.465740        0           1   0  1349  \n",
       "3   0.284297        0           1   0  1562  \n",
       "4   0.339143        0           1   0  1600  \n",
       "\n",
       "[5 rows x 35 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "FE_train = pd.concat([train['instant'], X_train,  train['yr'],train['cnt']], axis = 1)\n",
    "FE_train.to_csv('FE_day.csv', index=False)\n",
    "FE_train.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 731 entries, 0 to 730\n",
      "Data columns (total 35 columns):\n",
      "instant         731 non-null int64\n",
      "season_1        731 non-null uint8\n",
      "season_2        731 non-null uint8\n",
      "season_3        731 non-null uint8\n",
      "season_4        731 non-null uint8\n",
      "mnth_1          731 non-null uint8\n",
      "mnth_2          731 non-null uint8\n",
      "mnth_3          731 non-null uint8\n",
      "mnth_4          731 non-null uint8\n",
      "mnth_5          731 non-null uint8\n",
      "mnth_6          731 non-null uint8\n",
      "mnth_7          731 non-null uint8\n",
      "mnth_8          731 non-null uint8\n",
      "mnth_9          731 non-null uint8\n",
      "mnth_10         731 non-null uint8\n",
      "mnth_11         731 non-null uint8\n",
      "mnth_12         731 non-null uint8\n",
      "weathersit_1    731 non-null uint8\n",
      "weathersit_2    731 non-null uint8\n",
      "weathersit_3    731 non-null uint8\n",
      "weekday_0       731 non-null uint8\n",
      "weekday_1       731 non-null uint8\n",
      "weekday_2       731 non-null uint8\n",
      "weekday_3       731 non-null uint8\n",
      "weekday_4       731 non-null uint8\n",
      "weekday_5       731 non-null uint8\n",
      "weekday_6       731 non-null uint8\n",
      "temp            731 non-null float64\n",
      "atemp           731 non-null float64\n",
      "hum             731 non-null float64\n",
      "windspeed       731 non-null float64\n",
      "holiday         731 non-null int64\n",
      "workingday      731 non-null int64\n",
      "yr              731 non-null int64\n",
      "cnt             731 non-null int64\n",
      "dtypes: float64(4), int64(5), uint8(26)\n",
      "memory usage: 70.0 KB\n"
     ]
    }
   ],
   "source": [
    "FE_train.info()"
   ]
  }
 ],
 "metadata": {
  "_change_revision": 0,
  "_is_fork": false,
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
